Genome-Wide Libraries for Protozoan Pathogen Drug Target Screening Using Yeast Surface Display

The lack of genetic tools to manipulate protozoan pathogens has limited the use of genome-wide approaches to identify drug or vaccine targets and understand these organisms’ biology. We have developed an efficient method to construct genome-wide libraries for yeast surface display (YSD) and a YSD fitness screen (YSD-FS) to identify drug targets. We show the efficacy of our method by generating genome-wide libraries for Trypanosoma brucei, Trypanosoma cruzi, and Giardia lamblia parasites. Each library has a diversity of ∼10^5 to 10^6 clones, representing ∼6- to 30-fold coverage of the parasite’s genome. Nanopore sequencing confirmed the libraries’ genome coverage, with multiple clones for each parasite gene. Western blot and imaging analysis confirmed surface expression of the G. lamblia library proteins in yeast. Using the YSD-FS assay, we identified bona fide interactors of metronidazole, a drug used to treat protozoan and bacterial infections. We also found that nucleotide-binding domain sequences were enriched among clones associated with increased yeast fitness in the presence of metronidazole, indicating that this drug might target multiple enzymes containing nucleotide-binding domains. The libraries are valuable biological resources for discovering drug or vaccine targets, ligand receptors, protein–protein interactions, and pathogen–host interactions. The library assembly approach can be applied to other organisms or expression systems, and the YSD-FS assay might help identify new drug targets in protozoan pathogens.

Table S1. Giardia lamblia library transformation in Saccharomyces cerevisiae EBY100. Fifteen transformations were performed with Gl-lib by electroporation in three experimental groups (five transformations per group). Each transformation was performed with 100 ng of library DNA. a Calculated clones obtained by combining all 15 transformations. b Library size represents the number of clones obtained after transforming Escherichia coli with Gibson-assembled genomic fragments in pYD1. c Fold change comparing the number of clones (from yeast transformation) vs library size. d Mean and SDM of the means of Exp. groups 1, 2, and 3. Exp., experimental. SDM, standard deviation of the mean.

Scripts for computational data analysis

The scripts below were submitted to a cluster from Compute Canada (www.computecanada.ca/) using Linux Ubuntu in Windows Subsystem for Linux. A general file name is given for simplicity, e.g., "GenomeOfReference.fasta" for the reference genome, or "reads-map_v1.sam" for the mapped output file.
1. Decompressing fastq files and mapping the reads to the genome using minimap2.
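The original script for this step is not reproduced here. As a minimal sketch only, and not the authors' command, the step could be driven from Python as follows. The helper names, the input file name "reads.fastq.gz", and the choice of the "map-ont" preset (assumed because the libraries were sequenced by nanopore) are assumptions; the minimap2 options used (-a for SAM output, -x for the preset, -o for the output file) are standard minimap2 options.

```python
import gzip
import shutil
from pathlib import Path


def decompress_fastq(gz_path, out_path):
    """Decompress a gzipped FASTQ file to a plain-text FASTQ file."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return Path(out_path)


def minimap2_command(reference, fastq, sam_out, preset="map-ont"):
    """Build (but do not run) a minimap2 command that writes SAM output.

    The preset is an assumption; adjust it to the sequencing platform.
    """
    return ["minimap2", "-a", "-x", preset, "-o", str(sam_out),
            str(reference), str(fastq)]


# Hypothetical usage (requires minimap2 on the PATH and the input files):
#   import subprocess
#   fastq = decompress_fastq("reads.fastq.gz", "reads.fastq")
#   cmd = minimap2_command("GenomeOfReference.fasta", fastq, "reads-map_v1.sam")
#   subprocess.run(cmd, check=True)
```

Building the command as a list keeps file names with spaces intact and makes the exact call easy to log before submitting it to a cluster.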
#bamCoverage generates a coverage track by normalizing and binning the reads, producing a *.bw file. The file can be analyzed in a genome visualization tool (e.g., the Integrative Genomics Viewer, https://igv.org/app/) for visual validation and to check for potential bias in library coverage.
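To illustrate what "binning" means in the comment above, the following is a simplified Python sketch that averages per-base read depth over fixed-width windows. It is not the deepTools implementation and omits normalization; the function name and the bin_size parameter are hypothetical.

```python
def bin_coverage(depth, bin_size=50):
    """Average per-base read depth over consecutive fixed-width bins.

    depth: list of read depths, one value per base.
    Returns a list of (start, end, mean_depth) tuples with 0-based,
    half-open coordinates; the last bin may be shorter than bin_size.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be a positive integer")
    bins = []
    for start in range(0, len(depth), bin_size):
        window = depth[start:start + bin_size]
        bins.append((start, start + len(window), sum(window) / len(window)))
    return bins
```

Smaller bins preserve sharp coverage peaks, such as those over repetitive gene families, while larger bins give smoother and more compact tracks.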
5. Generate graphs for visualization of read coverage using circular plot in R.

})
#This adds a text in the middle of the circle
text(0, 0, "Genome\ncoverage", cex = 1.5)

6. Libframe analysis to calculate the predicted peptide lengths and amino acid sequences.
Note. The Libframe tool used in this step was developed in Python and is written for pYD1. If using a different expression system, the code can be modified to replace the Xpress tag sequence with any sequence that should be in-frame with the cloned fragments. The tool and its code are available at https://github.com/cestari-lab/Libframe-tool. To access the code, open the file using a text editor.
#The libframe script finds the Xpress tag sequence in the reads and translates everything in frame after the end of the tag until a STOP codon is found.
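The logic described in the comment above can be sketched in a few lines of Python. This is an independent illustration, not the Libframe code itself: the function name is hypothetical, the tag is passed as a parameter rather than hard-coded (the example tag in the usage note is a placeholder, not the Xpress sequence), only the forward strand is searched, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_after_tag(read, tag):
    """Translate the sequence in frame with, and downstream of, a tag.

    Returns the peptide up to (not including) the first stop codon,
    or None if the tag is not found in the read. Codons containing
    ambiguous bases are translated as "X".
    """
    read, tag = read.upper(), tag.upper()
    pos = read.find(tag)
    if pos == -1:
        return None
    peptide = []
    for i in range(pos + len(tag), len(read) - 2, 3):
        aa = CODON_TABLE.get(read[i:i + 3], "X")
        if aa == "*":  # stop codon ends the predicted peptide
            break
        peptide.append(aa)
    return "".join(peptide)
```

For example, with the placeholder tag "GATAAG", the read "CCGATAAGATGGCTTGGTAAGGG" yields the peptide "MAW"; the predicted peptide length is then simply the length of the returned string.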

Figure S1. Enrichment of reads aligning to large repetitive gene families. Top, read coverage (in black) over a segment of G. lamblia chromosome (Chr) 2. Overly enriched sequences (read coverage peaks) map to variant surface proteins (VSPs). Middle, read coverage (in black) over a segment of T. brucei subtelomeric region containing variant surface glycoprotein genes (VSGs) and pseudogenes. Bottom, read coverage (in black) over a segment of T. cruzi Chr 44 showing enriched sequences mapping to the subtelomeric region containing dispersed repetitive gene families. Genes are indicated by gray bars/rectangles.

Figure S2. Flow cytometry analysis of non-induced or induced library. Plots show the yeast cells transformed with pYD1 or Gl-lib (i.e., pYD1-G. lamblia library) stained or not with monoclonal antibodies against the Xpress epitope. Yeast cultures were grown in a medium containing 1, 2, or 3% galactose to induce library expression or non-induced (repressed) with 2% glucose (Glu). The percentage (%) of positive populations is shown in the graph. SSC-A, side scatter parameter.

Figure S3. Correlation of drug-treated versus non-transformed library. Scatter plot of read counts per million (CPM) of G. lamblia non-transformed library compared to yeast expressing Gl-Lib grown in the presence of metronidazole. The Pearson coefficient of correlation (Corr) comparing datasets is shown. The line shows linear regression comparing both libraries. Reads were aligned to the genome using the minimap2 tool, and read counts per gene were obtained using the featureCounts tool (from package Subread). The raw counts were analyzed using the package EdgeR to obtain normalized read counts per million, and Pearson's coefficient of correlation was calculated using RStudio (Posit). The graph was prepared using GraphPad Prism (GraphPad Software Inc). Scripts used for analysis are available below. See also Sternlieb et al. (26) protocol for additional information on computational analysis.
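The two quantities named in this legend, counts per million and Pearson's correlation coefficient, can be illustrated with a short Python sketch. This is a simplified stand-in for the EdgeR and R steps, not the authors' scripts: it scales by the plain library total and applies no additional normalization factors, and the function names are hypothetical.

```python
import math


def counts_per_million(counts):
    """Scale raw per-gene read counts to counts per million (CPM)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("library has no mapped reads")
    return [c * 1e6 / total for c in counts]


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with >= 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

Because CPM divides each library by its own total, two libraries sequenced to different depths but with the same relative gene representation give a correlation of 1.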

#Reading data from the plotCoverage output file (reads per base)